relative gradient
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
- North America > Canada (0.04)
Reviewers also praised the novelty
We thank the reviewers for their comments and the largely positive feedback. Reviewers agree that "the paper clearly [...]". The improvement our approach provides "is demonstrated by experiments [...]". The contribution was praised as "elegant" [...]. Rigorous formulation and convergence properties of relative gradient: We will add more details on this. We will include these references in the paper. These architectures have several limitations, e.g. they [...] We will include this discussion and reference in the paper. R6: Too much emphasis on existing concepts, too little on the proposed approach: We will try to balance this.
Relative gradient optimization of the Jacobian term in unsupervised deep learning
Gresele, Luigi; Fissore, Giancarlo; Javaloy, Adrián; Schölkopf, Bernhard; Hyvärinen, Aapo
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. A popular approach for solving it is mapping the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals -- thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian and is computationally expensive, thus imposing a trade-off between computation and expressive power. In this work, we propose a new approach for exact training of such neural networks. Based on relative gradients, we exploit the matrix structure of neural network parameters to compute updates efficiently even in high-dimensional spaces; the computational cost of the training is quadratic in the input size, in contrast with the cubic scaling of naive approaches. This allows fast training with objective functions involving the log-determinant of the Jacobian, without imposing constraints on its structure, in stark contrast to autoregressive normalizing flows.
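The quadratic-versus-cubic claim in the abstract can be illustrated for a single square linear layer z = Wx. This is a minimal sketch and not the authors' implementation. The function names and the log-cosh prior are assumptions chosen for illustration. The ordinary gradient of log p(Wx) + log|det W| contains the term W^{-T}, which requires a matrix inverse. Right-multiplying that gradient by W^T W (the relative gradient) cancels the inverse analytically and leaves only an outer product and a vector-matrix product.

```python
import numpy as np

def score(z):
    # Assumed prior (illustrative): log p(z) = -sum(log cosh z) + const,
    # so the derivative of log p with respect to z is -tanh(z).
    return -np.tanh(z)

def ordinary_gradient(W, x):
    # Gradient of log p(Wx) + log|det W| with respect to W.
    # The log-determinant term contributes W^{-T}, an O(D^3) inverse.
    z = W @ x
    return np.outer(score(z), x) + np.linalg.inv(W).T

def relative_gradient(W, x):
    # Ordinary gradient right-multiplied by W^T W, simplified by hand:
    # (phi x^T + W^{-T}) W^T W = phi (z^T W) + W.
    # Only a vector-matrix product and an outer product remain: O(D^2).
    z = W @ x
    return np.outer(score(z), z @ W) + W

def ascent_step(W, x, lr=1e-2):
    # One gradient-ascent step on the log-likelihood of a single sample.
    return W + lr * relative_gradient(W, x)
```

For any invertible W, `relative_gradient(W, x)` should match `ordinary_gradient(W, x) @ W.T @ W` up to floating-point error, which gives a simple check of the algebra.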
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)